In this paper, a novel method to reduce additive time-varying noise is proposed. Unlike previous methods, the proposed method requires neither assumptions about the noise nor estimation of the noise statistics from pause regions. The enhancement is performed on a band-by-band basis for each time frame. Based on a decision of whether a particular band in a frame is speech or noise dominant, together with the masking property of the human auditory system, an appropriate amount of noise is reduced in the time-frequency domain using modified spectral subtraction. The proposed method was tested under various noise conditions: car noise, F16 noise, white Gaussian noise, pink noise, tank noise and babble noise. On the basis of segmental SNR, inspection of spectrograms and MOS tests, the proposed method was found to be more effective than spectral subtraction with and without pause detection in reducing noise while minimizing distortion to the speech.
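The band-by-band processing described above can be illustrated with a short sketch in Python/NumPy. The band layout, the speech/noise dominance test (a simple energy-ratio threshold), and the over-subtraction factors below are illustrative assumptions, not the authors' exact rules.

```python
import numpy as np

def bandwise_spectral_subtraction(frame_mag, noise_mag, n_bands=8,
                                  alpha_noise=2.0, alpha_speech=1.0, floor=0.02):
    """Band-by-band modified spectral subtraction (illustrative sketch)."""
    frame_mag = np.asarray(frame_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    out = np.empty_like(frame_mag)
    for idx in np.array_split(np.arange(len(frame_mag)), n_bands):
        # Energy-ratio test deciding whether the band is speech- or noise-
        # dominant (a hypothetical stand-in for the paper's decision rule).
        band_snr = np.sum(frame_mag[idx] ** 2) / (np.sum(noise_mag[idx] ** 2) + 1e-12)
        alpha = alpha_speech if band_snr > 2.0 else alpha_noise
        # Subtract more noise where noise dominates and less where speech
        # dominates, keeping residual noise below an (assumed) masking margin.
        out[idx] = np.maximum(frame_mag[idx] - alpha * noise_mag[idx],
                              floor * frame_mag[idx])
    return out
```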
Junpei YAMAUCHI, Tetsuya SHIMAMURA
This paper presents an improved spectral subtraction method for speech enhancement. A new noise estimation method is derived in which the noise is assumed to be white. Using the property that a white noise spectrum is flat, the high-frequency components of the noisy speech spectrum are averaged to estimate the standard deviation of the noise. Because this operation is performed within the analysis segment, spectral subtraction combined with the new noise estimation method does not need non-speech segments and can therefore adapt to non-stationary noise conditions. The effectiveness of the proposed spectral subtraction method is confirmed by experiments.
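A minimal sketch of this in-segment noise estimation is given below. The window, the choice of the top 20% of the spectrum as the "high-frequency" band, the plain averaging, and the spectral floor are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def estimate_noise_level(noisy_frame, hf_fraction=0.2):
    """Estimate a noise magnitude level from the high-frequency part of one
    analysis segment, relying on the flat spectrum of white noise."""
    spec = np.abs(np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame))))
    start = int((1.0 - hf_fraction) * len(spec))
    # Speech energy concentrates at lower frequencies, so the top of the band
    # is treated as noise-dominated; its average approximates the noise level.
    return np.mean(spec[start:])

def spectral_subtraction_frame(noisy_frame, floor=0.05):
    """Spectral subtraction of one frame using the in-segment noise estimate,
    so no separate non-speech segment is required."""
    win = np.hanning(len(noisy_frame))
    spec = np.fft.rfft(noisy_frame * win)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - estimate_noise_level(noisy_frame), floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```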
Hiroshi SARUWATARI, Shoji KAJITA, Kazuya TAKEDA, Fumitada ITAKURA
This paper describes an improved complementary beamforming microphone array based on a new noise adaptation algorithm. Complementary beamforming uses two beamformers designed to obtain directivity patterns that are complementary with respect to each other. In this system, during a pause in the target speech, the two directivity patterns are adapted to the noise directions of arrival so that the expected value of each noise power spectrum in the array output is minimized. With this technique, directional nulls can be realized for each noise source even when the number of sound sources exceeds that of microphones. To evaluate the effectiveness, speech enhancement and speech recognition experiments are performed based on computer simulations with a two-element array and three sound sources under various noise conditions. In comparison with the conventional adaptive beamformer and the conventional spectral subtraction method cascaded with the adaptive beamformer, it is shown that (1) the proposed array improves the signal-to-noise ratio (SNR) of degraded speech by more than 6 dB when the interfering noise is two speakers at input SNRs below 0 dB, (2) the proposed array improves the SNR by about 2 dB when the interfering noise is babble noise, and (3) an improvement in the recognition rate of more than 18% is obtained when the interfering noise is two speakers or two overlapping multi-speaker signals at an input SNR of 10 dB.
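The pause-time adaptation criterion (minimize the expected noise power in the array output) can be sketched for a single beamformer of a two-element array as below. The grid search over null directions, the unit-norm null-steering weights, and the single-frequency treatment are illustrative assumptions; they are not the authors' adaptation algorithm, which adapts a pair of complementary patterns jointly.

```python
import numpy as np

def adapt_noise_null(noise_stft, freq, d=0.1, c=343.0):
    """During a pause in the target speech, pick beamformer weights that
    minimize the measured noise power in the array output (2-element array,
    one frequency bin; illustrative grid search only).

    noise_stft : (n_frames, 2) complex STFT values of the two microphones
                 at the bin `freq`, taken from the speech pause.
    """
    best_w, best_p = None, np.inf
    for theta in np.arange(-90.0, 90.5, 1.0):
        tau = d * np.sin(np.deg2rad(theta)) / c
        a = np.array([1.0, np.exp(-2j * np.pi * freq * tau)])  # steering vector
        w = np.array([a[0], -a[1]]) / np.sqrt(2)               # spatial null toward theta
        p = np.mean(np.abs(noise_stft @ w.conj()) ** 2)        # output noise power
        if p < best_p:
            best_w, best_p = w, p
    return best_w, best_p
```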
Hiroshi SARUWATARI, Shoji KAJITA, Kazuya TAKEDA, Fumitada ITAKURA
This paper describes a spatial spectral subtraction method that uses a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. Complementary beamforming is based on two beamformers designed to obtain directivity patterns that are complementary with respect to each other. It is shown that nonlinear subtraction processing with complementary beamforming can result in a kind of spectral subtraction that requires no speech pause detection. In addition, an optimization algorithm for the directivity pattern is described. To evaluate the effectiveness, speech enhancement and speech recognition experiments are performed based on computer simulations under both stationary and nonstationary noise conditions. In comparison with the optimized conventional delay-and-sum (DS) array, it is shown that: (1) the proposed array improves the signal-to-noise ratio (SNR) of degraded speech by about 2 dB and achieves word recognition rates more than 20% higher under white Gaussian noise at input SNRs of -5 or -10 dB, and (2) the proposed array achieves word recognition rates more than 5% higher under the nonstationary noise conditions. It is also shown that these improvements are the same as or superior to those of the conventional spectral subtraction method cascaded with the DS array.
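A hypothetical sketch of the nonlinear (power-domain) subtraction step follows: the output powers of the two beamformers are subtracted bin by bin, which behaves like spectral subtraction without requiring a speech pause. The weight vectors, the flooring rule, and the reuse of the primary output's phase are assumptions, not the estimator derived in the paper.

```python
import numpy as np

def spatial_spectral_subtraction(x_fft, w_primary, w_comp, floor=0.05):
    """Power-domain subtraction of two complementary beamformer outputs
    (hypothetical sketch of the idea, not the paper's exact estimator).

    x_fft     : (n_frames, n_mics) complex STFT values at one frequency bin
    w_primary : weights whose pattern preserves the target direction
    w_comp    : weights whose pattern is complementary (dips toward the target)
    """
    y_p = x_fft @ w_primary.conj()      # primary beamformer output
    y_c = x_fft @ w_comp.conj()         # complementary beamformer output
    # Subtracting the two output power spectra acts like spectral subtraction:
    # noise common to both patterns cancels, so no pause detection is needed.
    p_est = np.abs(y_p) ** 2 - np.abs(y_c) ** 2
    p_est = np.maximum(p_est, floor * np.abs(y_p) ** 2)   # spectral floor
    # Reuse the primary output's phase for resynthesis.
    return np.sqrt(p_est) * np.exp(1j * np.angle(y_p))
```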
Hack-Yoon KIM, Futoshi ASANO, Yoiti SUZUKI, Toshio SONE
In this paper, a new spectral subtraction technique with two microphone inputs is proposed. In conventional spectral subtraction using a single microphone, the averaged noise spectrum is subtracted from the observed short-time input spectrum. This reduces only the mean value of the noise spectrum, while the component varying around the mean remains intact. In the proposed method, the short-time noise spectrum excluding the speech component is estimated by introducing the blocking matrix used in the Griffiths-Jim-type adaptive beamformer with two microphone inputs, combined with a spectral compensation technique. By subtracting the estimated short-time noise spectrum from the input spectrum, not only the mean value of the noise spectrum but also the component varying around it can be reduced. The method can be interpreted as a partial construction of the adaptive beamformer in which only the amplitude of the short-time noise spectrum is estimated, whereas the adaptive beamformer is equivalent to an estimator of the complex short-time noise spectrum. By limiting the estimation to the amplitude spectrum, the proposed system achieves better performance than the adaptive beamformer when the number of sound sources exceeds the number of microphones.
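A minimal sketch of the two-microphone scheme is given below, assuming the target arrives at broadside so that the fixed beamformer reduces to a channel sum and the blocking matrix to a channel difference of the time-aligned inputs; the `comp_gain` parameter stands in for the spectral compensation and is an assumption, not the factor derived in the paper.

```python
import numpy as np

def two_mic_spectral_subtraction(x1_fft, x2_fft, comp_gain=1.0, floor=0.05):
    """Two-microphone spectral subtraction with a blocking matrix
    (illustrative sketch under a broadside-target assumption).

    x1_fft, x2_fft : complex STFT frames (arrays over frequency bins) of the
                     two microphones, already time-aligned to the target.
    """
    fixed = 0.5 * (x1_fft + x2_fft)      # fixed (delay-and-sum) beamformer
    blocked = 0.5 * (x1_fft - x2_fft)    # blocking matrix: target speech cancels,
                                         # leaving a short-time noise reference
    noise_mag = comp_gain * np.abs(blocked)   # compensated short-time noise amplitude
    # Subtract the *short-time* noise amplitude rather than a long-term average,
    # so the component varying around the noise mean is reduced as well.
    clean_mag = np.maximum(np.abs(fixed) - noise_mag, floor * np.abs(fixed))
    return clean_mag * np.exp(1j * np.angle(fixed))
```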